Combination of words and word categories in varigram histories

Author

  • Reinhard Blasig
Abstract

This paper presents a new kind of language model: category/word varigrams. This special model type permits a tight integration of word-based and category-based modeling of word sequences. Any succession of words and word categories may be employed to describe a given word history. This provides much greater flexibility than previous combinations of word-based and category-based language models. Experiments on the WSJ0 corpus and the 1994 ARPA evaluation data indicate that the category/word varigram yields a perplexity reduction of up to 10 percent compared to a word varigram of the same size, and improves the word error rate (WER) by 7 percent. Compared to a linear interpolation of a word-based and a category-based n-gram, the WER improvement is about 4 percent.
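To make the idea of mixed histories concrete, the following is a minimal sketch, not the paper's method: it only enumerates the ways a word history could be described when each position may be given either as the word itself or as its category. The category map, the angle-bracket labels, and the function name `mixed_histories` are all hypothetical, and the sketch says nothing about how the paper selects histories or estimates probabilities.

```python
from itertools import product

# Hypothetical word-to-category map; the paper's actual categories are not given here.
WORD_TO_CATEGORY = {"the": "DET", "stock": "NOUN", "market": "NOUN"}


def mixed_histories(history, word_to_category):
    """Return every description of `history` in which each position is
    either the word itself or its category label."""
    options = []
    for word in history:
        category = word_to_category.get(word)
        if category is None:
            # A word without a known category can only stand for itself.
            options.append((word,))
        else:
            options.append((word, f"<{category}>"))
    # The Cartesian product picks one representation per position.
    return [tuple(choice) for choice in product(*options)]


descriptions = mixed_histories(("the", "stock"), WORD_TO_CATEGORY)
for description in descriptions:
    print(description)
```

For a two-word history in which both words have a category, this yields four candidate descriptions: word/word, word/category, category/word, and category/category. A pure word model uses only the first and a pure category model only the last, which is one way to read the flexibility claim in the abstract.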

Similar resources

Compact n-gram models by incremental growing and clustering of histories

This work concerns building n-gram language models that are suitable for large vocabulary speech recognition in devices that have a restricted amount of memory and space available. Our target language is Finnish, and in order to evade the problems of its rich morphology, we use sub-word units, morphs, as model units instead of the words. In the proposed model we apply incremental growing and cl...

Adaptive topic-dependent language modelling using word-based varigrams

This paper presents two extensions of the standard interpolated word trigram and cache model, namely the extension of the trigram model by useful word m-grams with m > 3, resulting in a varigram model, and the addition of topic-specific trigram models. We give the criteria for selecting useful m-grams and for partitioning the training corpus into topic-specific subcorpora. We apply both extens...

English Vocabulary for Equine Veterans: How Different from GSL and AWL Words

ESP students are usually advised to master general and academic word lists such as West's (1953) General Service List (GSL) and Coxhead's (2000) Academic Word List (AWL) in order to read their academic texts. However, it seems that university students may not need to learn all the words in the two lists, as some of them occur infrequently in academic texts. Moreover, there are ...

Publication date: 1999